Prosody Dependent Speech Recognition on Radio News

Author

  • K. Chen
Abstract

Does prosody help word recognition? Humans listening to natural prosody, as opposed to monotone or foreign prosody, understand the content with lower cognitive load and higher accuracy [1]. For automatic Large Vocabulary Continuous Speech Recognition (LVCSR), the answer is less straightforward. Although successful word recognition and successful prosody recognition have each been demonstrated independently in many academic and commercial applications, no result reported in the literature shows that prosody improves word recognition on an LVCSR task. In 1997, Kompe [2] presented a theoretical proof that prosody can never improve word recognition accuracy unless the recognizer uses prosody-dependent models. In this paper, we propose a novel probabilistic framework in which words and phonemes depend on prosody in a way that improves word recognition. We propose the use of prosody-dependent allophones based on the “hidden mode variable” theory of Ostendorf et al. [3], but with prosody dependence carefully restricted to the subset of distributions known to be most sensitive to prosodic context. Specifically, we propose to model the prosody dependence of the phoneme duration probability density functions (PDFs), the acoustic-prosodic observation PDFs, and the language model, and to ignore the prosody dependence of the acoustic-phonetic observation PDFs. In so doing, we create effective models of the most striking and most often reported prosody-dependent allophonic variation without significantly increasing the parameter count of the speech recognizer.
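
The restriction described in the abstract can be sketched as a factorization. This is only an illustrative reading, not the paper's own notation: the symbols W (word sequence), P (prosody sequence), Q (allophone sequence), D (phoneme durations), X (acoustic-phonetic observations), and Y (acoustic-prosodic observations) are assumptions introduced here, and the exact conditioning used in the paper may differ.

```latex
% Joint decoding of words W and prosody P (assumed notation).
% p(W, P) is the prosody-dependent language model.
\[
(\hat{W}, \hat{P}) = \arg\max_{W,\,P} \; p(W, P)\, p(X, Y \mid W, P)
\]
% Assumed factorization of the acoustic model: the duration PDF and the
% acoustic-prosodic PDF are conditioned on P, while the acoustic-phonetic
% PDF is not, so its parameters are shared across prosodic contexts.
\[
p(X, Y \mid W, P) \approx \sum_{Q,\,D}
  p(Q \mid W, P)\; p(D \mid Q, P)\; p(X \mid Q, D)\; p(Y \mid Q, D, P)
\]
```

Under this reading, only the terms that carry P add prosody-specific parameters, which is one way to see why the parameter count of the recognizer need not grow significantly.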

Similar resources

How Prosody Improves Word Recognition

Prosody has been traditionally regarded as useless for word recognition. In this paper, we provide a schematic view describing how prosody can help word recognition. We provide our view in terms of a Bayesian network that models the stochastic dependence among acoustic observation, word, prosody, syntax and meaning, and an information-theoretic analysis proving that the mutual information betwe...

Improving the Robustness of Prosody Dependent Language Modeling Based on Prosody Syntax Dependence

This paper presents a novel approach that improves the robustness of prosody dependent language modeling by leveraging the dependence between prosody and syntax. A prosody dependent language model describes the joint probability distribution of concurrent word and prosody sequences and can be used to provide prior language constraints in a prosody dependent speech recognizer. Robust Maximum Lik...

An Intonational Phrase Boundary and Pitch Accent Dependent Speech Recognizer

Does prosody help word recognition? In this paper, we propose a novel probabilistic framework in which word and phoneme are dependent on prosody in a way that improves word recognition. We describe the idea of prosody dependent speech recognition by building a prosody dependent speech recognizer that conditions word and phoneme models on two important prosodic variables: intonational phrase bou...

Prosody Dependent Speech Reco Duration Modelling at Intonatio

Does prosody help word recognition? In this paper, we propose a novel probabilistic framework in which word and phoneme are dependent on prosody in a way that improves word recognition. The prosody attribute that we investigate in this study is the lengthening of speech segments in the vicinity of intonational phrase boundaries. Explicit Duration Hidden Markov Model (EDHMM) is implemented to pr...

Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus

This paper describes automatic speech recognition systems that satisfy two technological objectives. First, we seek to improve the automatic labeling of prosody, in order to aid future research in automatic speech understanding. Second, we seek to apply statistical speech recognition models of prosody for the purpose of reducing the word error rate of an automatic speech recognizer. The systems...

Publication date: 2003